Multics Technical Bulletin MTB-220

To: Distribution

From: A. Kobziar

Date: 10/10/75

Subject: New Storage System: Data Recovery (Part 1 - Salvaging)



Introduction 

The following discussion introduces the framework for the new
storage system data recovery design. Specifications for the new
storage system were given in MTB-110. This bulletin is concerned
with the part of the data recovery task currently termed
"salvaging." (The remaining part is backup and retrieval.) Data
recovery mechanisms exist because of imperfections, both in
hardware and software. The reason for a salvager redesign is to
increase two important Multics attributes, availability and
reliability. Availability implies that stored data should always
be dynamically accessible at the demand of any user, while
reliability implies no loss of stored data as well as the safe
storage of the security information used to protect the stored
data.

This MTB proposes a major change to today's salvaging operation.
To accommodate storage growth, salvaging will become dynamic and
distributed. More of the errors corrected by the salvager will
become user visible. An implementation plan which chronicles
the design decisions still to be made is given in the companion
MTB-221 "New Storage System Salvager Implementation." In order
to explain why this MTB's design is being proposed, some
background material is presented first.

Storage System Overview 

As a first order approximation, the Multics storage system can be
viewed as a logical organization for an array of file maps. This
logical organization can be broken into two parts: directory
control and storage control. Directory control handles the
logical structuring of the user data and stores the security
information. A directory consists of objects (branches, names,
acls, etc.) whose data is held in structures. Relations among the
objects are implemented by threading the structures together.
Storage control manages the file map arrays. A file map is
[...] than an array of physically sequenced keys to the [...]



Stored data can be "found" by only one [...]
traversing through a hierarchical structure of [...]
directories have internal structure, a [...]
traverse requires a physically correct internal [...]
have been caused by human, probabilistic [...]
cosmic (unknown, such as lightning causing [...]
actions. Because only the human cause [...]
(theoretically) be eliminated, error detection [...]
necessary. The mechanism used for this purpose [...]
offline salvager. The system is crashed upon [...]
error and correction is achieved by running [...]
making the system unavailable for useful work [...]
is necessary today, but its cost is too [...]
provided, since most of the directories salvaged [...]

New Storage System Structure

[...] System (NSS) splits current directory branches
[...] the logical attributes and the physical
[...] physical attributes are stored in a
[...] format, the volume table of contents entry
[...] contains a uid pathname. The connection between a
[...] branch and a vtoce is logical in both directions.
[...] inherently more costly to process for the salvager
[...] the storage system because two disk references,
[...], the other for the vtoce, are often necessary
[...] previously.



Projecting the present salvager's operation into an NSS format
gives running time estimates of 5 hours for a 133 disk drive
system. Performing the same operation with a multi-process
salvager could cut this time down by 1/2 to 1/10 depending on the
hardware configuration. Unfortunately, the near future capacity
doubling of the MSU0400s makes even the multi-process approach
unacceptable.



Current directory control is coded with the assumption that
threads and relative pointers are always valid. Thus a brief
description of the salvager's action would be that its primary
purpose is only to prevent faults on thread and relative pointer
references. A walkthrough of the salvager code reveals that
directory control relies on few of the other parts of a directory
object's structure. Other benefits derived from salvaging are
garbage collection (directory compaction and the freeing of space
used by process directories and hardcore segments), and quota
verification. Some cross-checking on acl structures and
access-class relationships is done in an attempt to establish
secure [...] non-compromise [...]

The salvager also checks for reused addresses by recording all
page assignments in a new free storage map which replaces the old
one at the end of salvaging. This task has been split out of the
directory salvager by the NSS design, since every volume (disk
pack) now contains its own map. A salvage operation over a volume
will be performed by a volume salvager, to be described later.

Terminology 

Before proceeding any further, definitions must be given for the
terms used. The term "salvaging" is misleading in its innate
description of the current code's function, since an operational
description is mostly "directory checking" with correction
occurring infrequently. When "salvager" is used it will refer
to today's operation. For the proposed design, compound terms
will be used to more clearly indicate which operations are being
discussed.

"Directory checking" is defined as that code which detects errors
in directories. "Directory salvager" defines that code which
corrects and compacts directories. "Connection checking" refers
to that code which checks branch-vtoce connections. "Volume
salvager" refers to that code which performs garbage collection
on a volume.

Specifications for the Directory Salvager

The directory salvager is first viewed as a black box with
input, output, and environmental specifications. The following
describes the input and output constraints:

1. The input is a bit string and some (read only) context
   predicates.

2. The output is a valid directory (this includes a null
   directory).

3. Given a valid directory as input, the output is the same
   valid directory. This can be called the non-destruction
   rule.

   a. Optionally, given a valid directory and a length as
      input, the output is a valid directory which includes
      the input valid directory as a subset. Valid objects
      that exist within the input length will be connected to
      the appropriate places. This is the reclamation
      option.

4. If an invalid directory object is found, then it will be
   changed to be valid only if no security compromise can
   occur. If it is changed then a possible loss of correctness
   will be indicated. Otherwise, it will be discarded. This
   is the object acceptance criterion and the inverse is the
   discard criterion.
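
The input and output constraints above can be illustrated with a
small sketch. Everything named here is hypothetical: the DirObject
record, its valid and repairable flags, and the salvage function
are assumptions made for illustration, and the real validity rules
would come from the storage system implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirObject:
    name: str
    valid: bool        # structure obeys the validity rules
    repairable: bool   # can be made valid with no security compromise


def salvage(objects):
    """Return (valid directory, names whose correctness is in doubt)."""
    kept, flagged = [], []
    for obj in objects:
        if obj.valid:
            kept.append(obj)                 # rule 3: non-destruction
        elif obj.repairable:
            kept.append(DirObject(obj.name, True, True))
            flagged.append(obj.name)         # rule 4: acceptance, loss indicated
        # otherwise the discard criterion applies and the object is dropped
    return kept, flagged
```

Under these assumptions a valid directory passes through unchanged,
an empty (null) directory is a legal output, and salvaging an
already salvaged directory changes nothing.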

A few words about the use of the words "validity" and
"correctness" are in order. An object is correct if its data has
not changed by any means other than by user calls to storage
system entry points that are provided specifically to change the
data. (Correct data is simply data that has not been clobbered
by the system.) Only certain correctness losses are detectable.

An object is valid if its structure conforms to the rules that
are implicitly given by the storage system implementation. A
minimum set of validity rules is defined by a particular
implementation. The directory salvager can, of course, validate
all possible structural parts, guaranteeing validity irrespective
of any storage implementation changes. This extra checking
guarantees that all errors (clobberings) which span both data
and structure will, if possible, be detected. Thus the
probability of detecting correctness losses is increased.

Design Basics 



Clearly, it is only necessary to rebuild directories that have
errors in them. All other rebuilding is wasteful and adds to the
cost of the service. If the goal of service continuity is put
aside for the moment, it would be acceptable to rebuild only the
one directory that caused a crash. In the current salvager,
consideration of reliability and garbage collection had made us
willing to spend the processing time required to salvage all
directories, in the belief that other inconsistencies might exist
and would cause crashes shortly into the next bootload. We
cannot afford such action on larger hierarchies because salvage
time increases linearly with the size of the hierarchy.

A deeper look into the use of the current salvager at external
sites reveals another purpose, that of restoring the confidence
level in an "intact" hierarchy back to 100%. (Here "intact" is
used to convey the ideas of correctness and validity.)

It is proposed that directory structure checking become an
integral feature of the online operation of directory control and
that the notion of a separate salvager subsystem be dropped.
Error detection will be done dynamically, and corrections will be
done by online rebuilding. Dynamic checking can be visualized as
the breaking up of the current salvager into two parts,
scattering the checking function throughout directory control,
and retaining the directory rebuild function. A salvage of the
entire hierarchy will still be possible, but will be rarely used.



The economics of dynamic checking indicate that it will be more
expensive than today's offline strategy on stable hardware
configurations with small disk complements. Part of this cost can
[...]

Environment

As insurance against non-random errors that are not detected by
the directory salvager, a small array of invocation times and
errors found will be kept in every directory header. If the
directory salvager is invoked too frequently, it will inform the
operator that a possible loop exists. A review of the errors
found should help in determining whether hardware or software is
suspect.
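
The invocation array can be pictured as a small ring of recent
salvages. The sketch below is only an illustration: the class
name, the number of slots, and the time window that counts as "too
frequently" are all assumptions, since the bulletin does not fix
them.

```python
from collections import deque


class InvocationLog:
    """Per-directory record of recent salvager invocations."""

    def __init__(self, slots=4, window=3600.0):
        # slots and window are hypothetical tuning values
        self.entries = deque(maxlen=slots)   # oldest entry drops off when full
        self.window = window

    def record(self, when, errors_found):
        """Log one salvage; return True if the operator should be warned."""
        self.entries.append((when, errors_found))
        full = len(self.entries) == self.entries.maxlen
        # every slot filled within one window suggests a possible loop
        return full and (when - self.entries[0][0]) <= self.window
```

The errors recorded alongside each time are what an operator would
review to decide whether hardware or software is suspect.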

Distributed Checking

So far the directory salvager has been viewed as a black box. The
following section describes the specific checks and structure
changes that are proposed.

[...] end of each structure a checksum field and an owner field
[...] be added. The owner field will contain the directory uid
[...] directory threaded objects, and the entry uid for entry
threaded objects. A length field and an object type field will be
[...] to the front of each object. Each threaded list will have
[...] count of the number of members in the list. Since this
[...] already true for all lists except initial acls, the initial
[...] total count will be replaced by an array of individual list
[...]

Therefore all directory objects will have tne following format: 



Based on directory structure statistics at MIT, the proposed
structure modifications would increase an average directory by
[...].

The following checks can be made by directory control. Each
check can be made independently of the others; thus the final
installation will have the most viable combination, as determined
by cost/performance studies outlined in MTB-221.

1. Change all storage system procedures that calculate relative
   addresses to check if the address is valid before using it. In
   most cases this can be done by adding one instruction when
   picking up relative pointers. For proper operation, the end
   of a valid thread will be some value other than 0.



2. All directory objects will have a type id in their
   structures. All references to a thread pointer or to the
   item itself will first check for the correct type value.
   This check will probably add three instructions to every
   reference. In a similar manner, the length and owner fields
   can be checked.

3. For all directory objects, templates are formed which test
   all the constant bits. For ASCII data this would translate
   to testing the first two bits of each character to be 0. For
   directory headers and entries this would translate to
   testing that all pad fields remained 0. These checks can be
   implemented via extended instructions and a test. Template
   checking will first be used by the directory salvager [...]

4. Finally, each object will have a checksum stored along with
   its data. A checksum calculation would take as many
   instructions as the length of the object plus two more for a
   comparison and transfer. The cost of storing a checksum when
   an object is created or changed is negligible if we assume
   that the number of directory references is much greater than
   the number of directory modifications; thus it should be
   calculated along with each modification and used by the
   directory salvager. Checksums will be calculated for only
   the relatively constant data fields in a structure, not for
   items such as date-time-used.
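
The type, owner, length, template, and checksum checks listed
above can be sketched together. This is an illustration only: the
field and function names are hypothetical, the checksum algorithm
(a sum modulo 2**36) is a stand-in because the bulletin does not
name one, and the template test assumes 7-bit ASCII held in 9-bit
characters.

```python
from dataclasses import dataclass

WORD_MASK = (1 << 36) - 1          # Multics words are 36 bits wide


class InvalidDirectory(Exception):
    """Raised where directory control would signal an error."""


def compute_checksum(words):
    # Stand-in algorithm; any function of the constant words would do.
    return sum(words) & WORD_MASK


@dataclass
class DirObject:
    length: int              # front of the object
    type_id: int             # front of the object
    constant_words: list     # relatively constant data, checksummed
    stored_checksum: int     # end of the object
    owner_uid: int           # end of the object
    date_time_used: int = 0  # volatile, deliberately not checksummed


def make_object(type_id, words, owner_uid):
    words = list(words)
    return DirObject(len(words), type_id, words,
                     compute_checksum(words), owner_uid)


def check_object(obj, expected_type, expected_owner):
    if obj.type_id != expected_type:
        raise InvalidDirectory("wrong object type")
    if obj.owner_uid != expected_owner:
        raise InvalidDirectory("wrong owner")
    if obj.length != len(obj.constant_words):
        raise InvalidDirectory("bad length")
    if obj.stored_checksum != compute_checksum(obj.constant_words):
        raise InvalidDirectory("checksum mismatch")


def ascii_template_ok(chars):
    # The top two bits of each 9-bit character must be zero.
    return all((c & 0o600) == 0 for c in chars)
```

Because date_time_used is left out of the checksum, updating it
does not force a recalculation, which matches the intent of
checksumming only the relatively constant fields.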

An easily implemented installation option would be to template
and/or checksum all access names during an access mode
calculation. Access_mode already references exception bits in the
[...] and thus would require only two or three extra instructions
to check another exception bit.

ACL Errors

When an acl error occurs, the current directory strategy of
sharing access names creates problems in retaining any
information from the valid acl entries. All acls that share a
bad access name must be deleted as no secure method exists for
protecting the integrity of an acl. It is proposed that an acl
out-of-service condition be supported by the storage system.
Replacement of the acl would be required to turn on service. But
the name sharing strategy has often produced many (if not all)
invalidated acls in a directory. Thus multiple corrections for
one error are required. If acl errors are frequent enough in NSS
then sharing of access names should be dropped. This change
would also have the beneficial effect of localizing all branch
attributes, thus reducing page faults. Sharing within a branch
would still be supported.

The cost of not sharing acl names is rather high, with an average
increase in directory size of 45%. Even if the savings obtained
from reduced page faults (due to the localization of the acl) are
included, the sum will still show a cost increase. Recovery of
the cost increase could be accomplished by implementing variable
size hash tables.

Directory Control Changes

As well as type and owner field checking, certain bounds and
cross checks of structure values will be added to directory
control when it is in the explicit interest of a procedure to
decrease its gullibility. Several checks of this kind already
exist; for example when acls are listed, the number found in the
acl list is compared to the count in the branch.

Currently the count of sharers kept with each acl name is used to
allow deletion of the name. If the count was incorrect, then
reassignment of the name slot would change a person or project
name on several acls. For this reason, it is proposed that access
names not be freed until a rebuild is performed. A reference to a
name with a zero sharing count will be one more form of error
detection.

Directory Allocation 

A simplification to the directory allocation scheme is proposed. 
Instead of maintaining several different fixed size free lists, 
all allocation requests will be placed at the end of a directory.
Slots that are freed will be zeroed and not reused. A total of
the freed space will be kept so that the necessity of compaction
can be determined. This strategy has the desired effect of
isolating the introduction of errors. For example, a new branch
will have its names and acls physically as well as logically
[...]

Now that allocation can accept any reasonable size request, links
can be stored more compactly. Also the introduction of new
objects into a directory need not consider the current block size
limitations. Changing sizes of current structures is also
facilitated. Implicit in this suggestion is that directory space
management will be done inline by directory control because it is
so simple. If variable size allocation is adopted, then the
proposed new directory structures would only increase an average
directory's size by 5% rather than 5%. If variable size hash
tables are implemented, then a net size decrease of 3[...]% could be
achieved.
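
The allocation scheme described in this section (append at the
end, zero freed slots, never reuse them, keep a freed-space total)
can be sketched as follows. The class and method names and the
compaction threshold are hypothetical; the bulletin only says that
the total lets the need for compaction be determined.

```python
class DirectorySpace:
    """Append-only space management for one directory."""

    def __init__(self, size, compaction_threshold):
        self.words = [0] * size
        self.next_free = 0            # everything past this is unallocated
        self.freed_total = 0          # words freed since the last rebuild
        self.threshold = compaction_threshold   # assumed tuning value

    def allocate(self, n):
        """Place a request of any size at the end of the directory."""
        if self.next_free + n > len(self.words):
            raise MemoryError("directory full; a rebuild is needed")
        offset = self.next_free
        self.next_free += n
        return offset

    def free(self, offset, n):
        """Zero the slot and count it; the space is not reused."""
        self.words[offset:offset + n] = [0] * n
        self.freed_total += n

    def needs_compaction(self):
        return self.freed_total >= self.threshold
```

Since freed slots are never handed out again, a later allocation
cannot overwrite a stale reference into a freed slot, which is one
way to read the goal of isolating the introduction of errors.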

Triggering Rebuilds 

Whenever the freed space total and the count of directory
attribute modifications exceed some threshold, the directory
salvager will be invoked to perform a rebuild. Here we have
achieved the desired property that the more a directory is
modified, the more often it is validated. It is not necessary to
count directory read operations because the new storage system
design does not require any directory modifications in order to
read (search) directories.

For dynamically detected errors, the mechanism used to trigger
the directory salvager is the following: Whenever a directory is
locked, a handler for the invalid_directory condition is set up
to call the directory salvager. After the rebuilding, control is
transferred to the statement following the lock call, thus
repeating the function on the rebuilt directory. Internal
directory control procedures need only signal whenever an error
is found. All procedures which lock directories must be checked
for code which will operate properly when restarted after the
lock call - for instance, variables assigned before the lock call
cannot be reassigned after the lock call.
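
The lock, signal, rebuild, and restart sequence described above
can be sketched with an exception standing in for the
invalid_directory condition. The function and parameter names are
hypothetical, and the max_rebuilds guard is an added assumption,
not something the bulletin specifies.

```python
class InvalidDirectory(Exception):
    """Stands in for the invalid_directory condition."""


def with_locked_directory(directory, operation, salvage, max_rebuilds=1):
    """Run operation under the lock, rebuilding on a signalled error.

    operation must be restartable: it may run more than once, so it
    must not depend on state it changed before the error was found.
    """
    directory["locked"] = True                 # the lock call
    try:
        rebuilds = 0
        while True:
            try:
                return operation(directory)    # code following the lock call
            except InvalidDirectory:
                if rebuilds == max_rebuilds:   # assumed guard against looping
                    raise
                salvage(directory)             # handler calls the salvager
                rebuilds += 1
    finally:
        directory["locked"] = False
```

Internal procedures only need to raise the error; the retry logic
lives in one place, at the point where the directory is locked.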

Error Reporting

The methods used today for reporting errors detected by the
salvager are inadequate. Of significant concern to users are
missing branches, bad names, and lost acls. Although the
salvager prints all detected errors, no method exists for
distributing these messages or issuing warnings. There is also a
problem in deciding who should receive the messages.

Both the directory salvager and the volume salvager will use the
syserr mechanism for recording errors, as the syserr log is the
permanent record of system events (especially detected failings).
The log can be processed online in order to detect error patterns
and maybe even predict hardware failures. To reflect errors to
users, flags will be set. Bad names and missing branch flags
will be set in the directory header while an invalid acl flag
will be set in the branch. The current action of deleting
invalid acls will be changed to retain the acl for listing
purposes. Directory control will treat the invalid acl flag as if
the acl was null (acl out-of-service). The invalid acl flag can
be turned off by either deleting the entire acl or replacing it.
No action for missing names and branches is currently planned, as
these are relevant to Multics search rules and could affect every
process. For example, if a name was missing in >sss then every
process might be stopped until the flag was reset. In the future
such errors could become visible by changing the search of such
directories to signal some condition. The default action would
be to ignore this signal.

Storage Control 

Storage control errors are handled by a volume salvager. The 
input and output specifications for the volume salvager are as
follows:

1. The input is the string of bits that comprise a volume.

2. The output is a valid New Storage System format volume.

3. Given a valid storage volume, the same storage volume is
   returned.

4. If an invalid vtoce is found, it will be deleted. If a
   reused address is found and the vtoce appears to be a
   directory, the volume salvager will invoke the directory
   salvager on this directory. If no errors are found, then
   the page in question will be awarded to the directory and
   the vtoce set out-of-service. Turning on service to such a
   directory must be performed by an administrator.
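
Detecting a reused address amounts to rebuilding the volume map
from the vtoces and noticing any page address claimed more than
once. The sketch below assumes a much simplified model in which
each vtoce is just a uid mapped to a list of page addresses; the
function name and data shapes are hypothetical.

```python
def rebuild_volume_map(vtoces):
    """Return (owner by address, reused addresses with their claimants).

    vtoces maps a vtoce uid to the list of page addresses it claims.
    """
    owner = {}
    reused = {}
    for uid, addresses in vtoces.items():
        for addr in addresses:
            if addr in owner:
                # second or later claimant: record the conflict
                reused.setdefault(addr, [owner[addr]]).append(uid)
            else:
                owner[addr] = uid        # first claimant
    return owner, reused
```

For example, rebuild_volume_map({1: [10, 11], 2: [12, 11]}) reports
address 11 as reused with claimants 1 and 2. Deciding which
claimant is awarded the page is a separate step that, per item 4,
involves the directory salvager.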

The volume salvager environment is also similar to the directory
salvager's. Each establishes exclusive control over its subjects
(in this case a disk pack). Also each relies on correctly
functioning lower level mechanisms. For the volume salvager this
is disk i/o. In the final implementation, the volume salvager
should gain control of the volume via RCP. But for now, a direct
path to the disk dim will be used.

Disk Packs

To aid in checking vtoces, their structure will be extended to be
similar to that of directory objects. The vtoce checksum will
cover only the uid pathname and the access-class, as other vtoce
fields change too frequently.



Since only a limited reused address check can be made by page
control (the user of vtoces), volume checking would normally
occur infrequently. Therefore triggering the volume salvager has
to be accomplished artificially. One installation option would
be to salvage at the time the disk is logically connected. This
might be judged too costly (1-3 min. per MSU0400), so that
scheduled volume salvaging could be implemented for slack time
periods.



As well as salvaging all vtoces, the volume salvager will
reconstruct the volume map and will check for reused addresses. A
reused address involving a directory will be resolved by asking
the directory salvager if any errors were found in salvaging the
pages that include the reused address data. A finding of no
errors would result in awarding the page to that directory. If
errors were found, then the rebuilt version of the directory with
a zero page would replace the bad one, and a retrieval request
for that directory issued. (A more detailed description of
directory retrievals will be given in the backup MTB.) A reused
address on a segment would result in a null address award
(equivalent to zeroing), an out-of-service indication, and a
retrieval request. A second pass over the volume will be made to
handle any directories that were the first claimants of reused
addresses.



Branch - VTOCE Connection

The new storage system design includes the dynamic checking of
the logical connection between the branch and vtoce at activation
time. The resolution of an error at this time should be as
follows:

1. Check the vtoce checksum. If it is correct then mark the
   branch as unconnected. A user encountering an unconnected
   branch can either delete it or issue a retrieval request for
   its vtoce. A future addition might be to allow scanning of
   volumes for an unconnected vtoce entry with the matching uid
   and upon finding one, connecting the branch to it.

2. If the checksum is wrong, then mark the vtoce out-of-service
   and issue a retrieval request for that vtoce. The user
   referencing the branch would receive the out-of-service
   error immediately and could try again at some later time.

One future extension should be mentioned. Whenever a directory
retrieval is performed, instead of replacing the contents in
toto, the version from backup and the existing version could be
logically coalesced. This would prevent loss of new branches. In
any case, notice that although a retrieval can return already
deleted branches, the correct action is taken at activation when
a connection mismatch is detected.
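
The two resolution cases for a connection error found at
activation can be written as a single decision. This is a sketch
under simplifying assumptions: branches and vtoces are plain
dictionaries, and the function name, flag names, and retrieval
queue are all hypothetical.

```python
def resolve_connection_error(branch, vtoce, vtoce_checksum_ok, retrieval_queue):
    """Apply case 1 or case 2 to a detected branch-vtoce mismatch."""
    if vtoce_checksum_ok:
        # Case 1: the vtoce checks out, so flag the branch instead.
        branch["unconnected"] = True
        return "unconnected"
    # Case 2: the vtoce is damaged; take it out of service and
    # ask backup to retrieve it.
    vtoce["out_of_service"] = True
    retrieval_queue.append(vtoce["uid"])
    return "out_of_service"
```

In case 1 the retrieval is left to the user, who may prefer to
delete the unconnected branch; in case 2 the request is issued
automatically.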



Loops

In the effort to preserve all possible information, we have
[...] not to delete objects, but to mark them as having errors,
and to allow users to issue retrievals. Unfortunately, there is
no guarantee that the retrieved information is correct - in fact
[...] error. This is a loop which only a user can
detect. The resolution is that, if necessary, a previous copy
retrieval should be tried, ad infinitum.

An apparent loop also exists in the specification of reused
address processing. Assume that the volume salvager detects a
reused address when processing a directory vtoce. It asks the
directory salvager for some advice. But the directory salvager,
in formulating its opinion, can get a reused address signal
from page control, and this would invoke the volume salvager.
This sequence is prevented from happening if we insure that all
addresses in a particular directory are unique (done by the
[...]) and that the volume salvager has exclusive
control of the disk pack (thus page control cannot signal a
reused address on it).



Loose Ends

Purposely saved until the end is the subject of quota validation.
The elimination of offline salvaging implicitly dropped this
function, since it can only be done on quiescent subtrees. It
could be performed online only if there was a guarantee that this
was the only process looking at the subtree. One approach to
achieve this would be to turn on security out-of-service for all
components in the subtree. Once we assume or take action to
provide exclusiveness, a procedure which sets the used values in
an aste must be provided. It is proposed that quota validation
become a part of the administrative mechanisms used in
determining volume usage charges.

While on the subject of charges, notice that the directory
control checking design has transformed the collective aggregate
cost of offline salvaging into a process assigned "pay as you
use" part of the storage system. For physical volumes that are
wholly owned by projects, even the use of the volume salvager as
the garbage collection device is automatically charged to the
correct project.



Summary

1. The salvager is split into three parts: a directory salvager
   which rebuilds directories, scattered checking in directory
   control, and a volume salvager which checks for reused
   addresses and rebuilds the volume map.

2. Detected errors are entered in the syserr log, and users are
   notified of errors by out-of-service conditions and error
   bits in the directory header.

3. Directory structures are expanded to be more robust and the
   directory allocation scheme is changed to take advantage of
   the directory salvager. The costs for an average directory
   are as follows:

      structure changes            [...]
      variable allocation          [...]
      non-shared access names      +45%
      variable size hash table     -33%

(end)



